AN EVALUATION OF SPEECH COMPRESSION TECHNIQUES
ABSTRACT 

The  results  of  previous  technical  reports,  prepared  under  this  contract, 
are  summarized  and  the  results  of  tests  of  various  recently  developed 
speech  compression  systems  are  presented  and  analyzed. 

The results obtained from an analysis of published data and theory on
techniques for compressing the bandwidth of speech, and from new data
collected on this matter during the course of this contract, can be
summarized as follows:

a. Semi-vocoders, operating at 9600 bits/sec, and channel
vocoders, at 2400 bits/sec, will provide speech of adequate
intelligibility and quality for most military communications.
The voice quality of the semi-vocoders will usually be somewhat
superior to that of the channel vocoders.

b. Results with as yet incomplete systems suggest that the
Tarasoff-Daguet technique and the narrow-band spectrum sampling
technique will ultimately provide speech intelligibility and
quality comparable to that of the semi-vocoder. These systems
will also require about 9600 bits/sec channel capacity during
transmission.

c. Formant-tracking vocoders operating at about 1200 bits/sec
can probably be developed to the point where they will provide
speech intelligibility comparable to that from the channel
vocoders operating at 2400 bits/sec. At the present time
formant-tracking vocoders operating at 1000 bits/sec do not provide
adequate speech intelligibility.




Report No. 978


Bolt  Beranek  and  Newman  Inc. 


d. Transmitting a restricted set of speech "patterns," selected from a
larger set obtained from a digitized vocoder-type speech
analyzer-synthesizer, may provide a reduction in the information
transmission rate normally required by that particular vocoder.
Although this "pattern" matching technique is still in its early
experimental stages, it appears that the normal bit rate for a
given vocoder system may, by this technique, be reduced to 2/3
and possibly to 1/2 of its normal magnitude.
1. INTRODUCTION

The aims set forth for Contract USAF 30(602)-2235, "An Evaluation
of Speech Compression Techniques," were:

(1) to determine the relative strengths and weaknesses of
presently available speech compression techniques,

(2) to evaluate these techniques as to possible future
potential and expansion,

(3) to determine the best method for equipment development
in the near future, and

(4) to determine the best areas for future intensive research
effort.

The results of major efforts to meet these goals have been
previously reported:

(1) K. N. Stevens, "Review of Existing Speech Compression
Systems," RADC-TN-60-197, October 1960, and

(2) K. N. Stevens, M. H. L. Hecker and K. D. Kryter, "An
Evaluation of Speech Compression Systems," RADC-TDR-62-171,
1 March 1962.

These two reports present in detail a critical analysis of the state
of the art to date in both engineering techniques and theory for reducing
the frequency bandwidth normally required to transmit intelligible
speech. The reports also discuss the best probable paths
to follow in future research and development in this area.
The results of a comprehensive speech testing program performed
under this contract on various speech compression systems were a
major aid to our evaluation of speech compression techniques. This
speech testing was conducted in a way that permitted direct and
accurate intercomparisons among various speech compression techniques
with respect to PB word intelligibility, talker recognition, general
speech quality and, last but not least, confusions made among
so-called nonsense syllables. These latter tests provided interesting
"diagnostic" information about the weaknesses and strengths of
the various speech compression systems.

Subsequent to the completion of RADC-TDR-62-171 in March 1962, a
survey was made of U. S. Air Force research and development projects
that might provide additional speech compression equipment that would
be completed by 1 January 1963. It was proposed, as a follow-up to
the work already accomplished, that these new speech compression systems
be tested with PB intelligibility tests in a manner that would permit
comparison of the results with those reported in TDR-62-171. It was
also proposed that during this period (March 1962 to January 1963)
the contractor would explore the use of some "peak-picking
(formant-tracking) technique" in conjunction with the so-called narrow-band
spectrum sampling system. The termination date of the subject contract
was extended to March 1963 so that these proposed tasks could be
prosecuted.

In the report to follow we will: (1) briefly summarize a few selected
highlights of the previous technical notes and reports issued under
this contract; (2) give the test results for speech bandwidth
compression systems that have been developed in the past six months or so
and which we were able to test in December 1962 and January 1963; and
(3) present a description of the efforts made to combine a peak-picking/
formant-tracking system with the spectrum sampling (narrow-band) system.
2. SUMMARY OF PREVIOUS TECHNICAL REPORTS

2.1 RADC-TN-60-197, Review of Existing Speech Compression Systems

In this report, K. N. Stevens summarizes in a chart (see Fig. 1) his
analysis of speech intelligibility test results of various speech
compression systems obtained up to 1960. This chart relates the
channel capacity required for speech transmission, in bits per second,
to percent PB word intelligibility scores.

PB word scores cannot be used in an "absolute" sense, inasmuch
as they are strongly influenced by the ability of a particular group
of listeners and talkers, as well as by the amount of training
received by the listeners on a particular system. Nevertheless, the reader
may be interested in comparing the predictions made by Stevens on the basis
of previous results from the literature with the results obtained in
our studies as reported in TDR-62-171 and Section 3 of this report. It
might be noted that in no case did we find in our tests a system that
exceeded the estimates plotted by Stevens in Fig. 1. In general,
however, the test results agreed very well with the "predictions" and
extrapolations given in Fig. 1.

2.2 RADC-TDR-62-171, An Evaluation of Speech Compression Systems

TDR-62-171 is a lengthy document, 115 pages plus appendices. The
following speech compression systems were tested in the course of the
studies presented in TDR-62-171:

(1) Reference (low-pass) system. This system consists of a
Spencer-Kennedy Model 302 electronic filter, set for 1500 cps
low-pass operation with a characteristic slope of -36 dB/octave.
(2) Channel vocoder. Two systems were tested: Model HY-2, Philco
Company, courtesy of the U. S. National Security Agency, and
Model HC-135, Hughes Aircraft Company, Communications Division.
Each vocoder has a 2400 bits/sec digital output and an estimated
400 cps analog bandwidth.

(3) Semi-vocoder. General Dynamics Corporation, Stromberg-
Carlson Division. This "base-band" vocoder has an estimated
analog bandwidth of 900 cps.

(4) Formant vocoder. Melpar, Inc. The formant-tracking vocoder
tested produced a digital information stream of 1000 bits/sec;
the bandwidth for analog operation is approximately 140 cps.

(5) Spectrum sampling (narrow-band) system. Bolt Beranek and
Newman Inc. The analog bandwidth utilization is 800 cps.

(6) Tarasoff-Daguet system. Courtesy of U. S. Army Signal Research
and Development Agency. The estimated analog bandwidth is
approximately 1000 cps.

The  speech  compression  systems  listed  above  were  subjected  to  several 
types  of  tests.  These  tests  were  designed  to  measure: 

(1) the intelligibility of phonetically balanced (PB) words,

(2) the intelligibility of nonsense syllables, with emphasis on
the confusions made,

(3) the general quality of the processed signal,

(4) the accuracy with which listeners can recognize a given
talker out of a small group of talkers, and

(5) the comprehension of continuous speech as a function of the
degree of noise interference.

Figure 2, taken from TDR-62-171, shows the average scores
obtained on PB word tests for male talkers on the various speech
compression systems. This figure is presented again as an aid in making
comparisons with the speech compression system chart presented in
Fig. 1 and with the test results given in Section 3 of this report.

There is much additional information available in TDR-62-171 of
interest to research and engineering personnel working in the field of
bandwidth compression systems. For our present summary we believe
that the data presented in Fig. 2 will suffice. It can be noted,
however, that the general rank ordering to be found in TDR-62-171 of the
various systems by the different types of speech tests (talker
recognition, general quality, etc.) was not appreciably different from the
rank ordering found for these systems with the PB word tests.

It was concluded from these tests that channel vocoders operating at
2400 bits/sec and semi-vocoders at about 9600 bits/sec would provide
adequate intelligibility and quality for most military communications,
but that formant-tracking vocoders (operated at about 1000 bits/sec)
require considerable improvement before they can be considered
satisfactory.
3. PB WORD TESTS OF "NEW" SYSTEMS

A survey of Air Force contractors working on speech communication
systems revealed that the following speech compression devices being
developed in this country were ready for testing by January 1963:

(1) Semi-vocoder: General Dynamics Corporation, Stromberg-
Carlson Division.

(2) Semi-vocoder: Texas Instrument Company.

(3) Channel vocoder: Texas Instrument Company.

(4) Channel vocoder: General Dynamics Corporation,
Stromberg-Carlson Division.

(5) Channel vocoder: Air Force Cambridge Research
Laboratory (AFCRL).

(6) Pattern matching system: AFCRL.

Accordingly, a contractor representative visited the above-mentioned
organizations and obtained recordings of pre-recorded PB word tests as
they were played through the various speech compression devices. The
resulting recordings were then administered via Telephonics TDH-39
earphones to a crew of 6 trained listeners (college students) in
January 1963. The tests were conducted under quiet listening conditions.

In addition to the test recordings obtained with the aforementioned
systems, PB word tests were administered to the listening crew
over the so-called reference system and the Philco HY-2 channel vocoder
(see Section 2 for descriptions of these systems). The tests with
these latter two systems were included to determine whether this new
listening crew was more or less proficient than the crew used for the
tests reported in TDR-62-171. The test results are presented in Figs.
3 and 4.

3.1  Discussion  of  Results 

Statistical significance of differences. The grand averages given
as the circles in Figs. 3 and 4 represent averages of four 50-word
PB tests. Statistical analysis of PB word tests shows that data points
based on 200 PB words that differ by 5 percentage points are significantly
different at the 99% level of confidence; that is, a difference of 5
percentage points or greater between two systems or test conditions
would be expected 99 times in a hundred repetitions of the same
experiment. This general rule can be applied in estimating the possible
significance of average differences among the results obtained with
the various systems.
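
The report does not state how this rule was derived. As a rough
plausibility check only, the following Python sketch treats every scored
word as an independent binomial trial and assumes, hypothetically, that
each data point pools a crew of 6 listeners x 200 words = 1200 responses.
The function name, the pooled response count, and the 80% and 75% example
scores are assumptions made for illustration, not values taken from the
original analysis.

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """z statistic for the difference between two independent proportions.

    Uses the unpooled standard error; p1 and p2 are fractions correct,
    n1 and n2 are the numbers of scored responses behind each fraction.
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Assumption: 6 listeners x 200 words pooled into one data point.
RESPONSES_PER_POINT = 6 * 200
# Two-sided critical value of the standard normal at 99% confidence.
Z_99_TWO_SIDED = 2.576

# Hypothetical pair of scores that differ by 5 percentage points.
z = two_proportion_z(0.80, 0.75, RESPONSES_PER_POINT, RESPONSES_PER_POINT)
print(f"z = {z:.2f}; exceeds 99% critical value: {abs(z) > Z_99_TWO_SIDED}")
```

Under these assumptions a 5-point difference near the 80% level clears the
99% critical value; with only 200 independent responses per point it would
not, so the stated rule presumably rests on pooling over listeners.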

Some of the systems tested previously, with results reported in
TDR-62-171, were re-tested along with these newer systems. The previous
results, as well as the present ones, are shown in Figs. 3 and 4. The
average difference between the grand averages for the systems that
were tested in both experiments is 3.6 percentage points, with the
difference in favor of the first study. If one wished to compare one
of the new systems with one of those tested previously, but not re-tested
in the present study, it would be reasonable to add about 3.6 percentage
points to the new score to compensate for the apparent
difference in test crew proficiency. After making such an adjustment, the
statistic (5 percentage points) suggested in the previous paragraph can be
applied to the remaining difference, if any, to test for statistical
significance.
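
The adjustment procedure just described amounts to two lines of
arithmetic. The Python sketch below is offered only as an illustration:
the 3.6-point crew offset and the 5-point criterion come from the text,
while the function name and the example scores are hypothetical.

```python
# Percentage points by which the first crew outscored the present crew (from the text).
CREW_OFFSET = 3.6
# Difference, in percentage points, taken as significant at the 99% level (from the text).
SIGNIFICANCE_THRESHOLD = 5.0

def compare_new_to_old(new_score, old_score):
    """Compare a newly tested system with one tested only in the first study.

    Returns (adjusted_new_score, remaining_difference, is_significant),
    where the new score is first raised by the crew offset.
    """
    adjusted = new_score + CREW_OFFSET
    difference = adjusted - old_score
    return adjusted, difference, abs(difference) >= SIGNIFICANCE_THRESHOLD

# Hypothetical scores: a new system at 70% against an older system at 80%.
adjusted, difference, significant = compare_new_to_old(70.0, 80.0)
print(f"adjusted {adjusted:.1f}%, difference {difference:+.1f}, significant: {significant}")
```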
Absolute scores. During the course of our experimentation with
various speech compression systems it became obvious that with
sufficient practice a crew of listeners could become quite proficient
in interpreting the speech coming from a given system. For this
reason, it was necessary to use an experimental design that provided
the test crew with about an equal amount of listening experience for
each of the bandwidth compression systems being evaluated. (The
crew was given an initial 20 hours of training on the PB word tests
with a broad-band speech system prior to the experiments proper, to
overcome the initial learning of the test materials and procedures
usually encountered with a test crew of listeners.)

As a result of the experimental procedures followed, the scores we
obtained for any given speech compression system are lower than the
scores that might be obtained for that system with continued
concentrated training of a group of listeners on only that or similar speech
processing systems. However, our results provide, we believe, a valid
basis for making comparisons among the various speech compression
systems that were tested.

It is our subjective judgment that any system that would not score
about 80% correct on PB word tests with a crew of listeners (such as
used in these experiments) who were reasonably well trained on PB word
tests but who heretofore had had no appreciable experience with a
particular speech processing technique or system would be considered
unacceptable by the average user. The "acceptability" of a speech
communication system is, however, a complex question that cannot be
answered on the basis of intelligibility tests alone, nor in any
absolute sense. What is acceptable in some situations and by some users
would be rated as unsatisfactory in other situations or by other users.
FIG. 3 AVERAGE SCORES ON PB WORD TESTS. TWO
50  WORD  TESTS  BY  EACH  OF  TWO  MALE 
TALKERS  IN  QUIET. 


FIG.  4  AVERAGE  SCORES  ON  PB  WORD  TESTS.  TWO 
50  WORD  TESTS  BY  EACH  OF  TWO  FEMALE 
TALKERS  IN  QUIET.  DYNAMIC  MICROPHONE. 
3.1.1 Semi-vocoders -- male talkers

The  results  obtained  with  the  semi-vocoders  are  worthy  of  several 
comments. 

The semi-vocoder built by Stromberg-Carlson scored appreciably lower
in these new tests than in the previous ones, 85% vs 70% (see Figs.
2 and 3). Presumably the major difference between the two tests lies
in the fact that this semi-vocoder when first tested (Fig. 2) was
operated in a completely analog mode, whereas for the second tests
(Fig. 3) the information from the vocoder channels was digitized
during its processing by the system.

It will be recalled that a similar degradation resulting from
switching from analog to digital mode was found in the Hughes vocoder (see
TDR-62-171). Although we have no other comparative test scores to
report that provide a direct evaluation of the effects of
analog-to-digital conversion, it is apparent from these examples, as well as from
other more casual tests we have had the opportunity to make on speech
compression systems, that the number of bits required to transmit in
digital form the analog information involved is often greater than
anticipated.

For this reason, intelligibility test scores obtained with systems
that are similar, except that one is operated in the digital mode and
the other in the analog mode, should not be compared. Therefore, the
channel vocoder built by Texas Instrument Company cannot be judged as
being superior to the other channel vocoders tested, although its
scores were generally higher. Unfortunately, the analog-to-digital
conversion equipment for the Texas Instrument Company channel vocoder
was not completed at the time we had to conduct these tests, January
1963.
It was observed that although the voice quality of the semi-vocoder
made by Texas Instrument Company was excellent, the higher frequency
bands sounded relatively weak. We think that it was for this reason
that this semi-vocoder scored slightly lower than did the channel
vocoder also made by the Texas Instrument Company. The higher
frequency bands were relatively louder in this channel vocoder than
in their semi-vocoder.

3.1.2  Channel  vocoders  --  male  talkers 

The channel vocoder made by the Stromberg-Carlson Division of General
Dynamics Corporation was designed for use with a standard carbon
(telephone) microphone, whereas the Philco instrument was designed
for use with a microphone, such as the Altec-Lansing 661A dynamic,
that has a good frequency response down to frequencies as low as 50
cps. It is interesting to note that the Stromberg-Carlson channel
vocoder performs about as well, if not slightly better (74% vs 71%),
with the carbon as it does with the dynamic microphone; on the other
hand, the Philco channel vocoder scores 83% (average of 84% and 82%,
see Fig. 3) with the dynamic microphone and 73% (average of 75% and
71%) with the carbon microphone. It should be noted, as might be
expected, that with a carbon microphone the speech was more realistic
(although the intelligibility was about equal) on the Stromberg-
Carlson machine than on the Philco, but with a dynamic microphone
the Philco channel vocoder was superior.

3.1.3  Pattern  matching  system 

C. P. Smith* designed and developed a system in which the most frequent
patterns (taken every fraction of a second) to be found during normal


* C. P. Smith, Speech data reduction: voice communications by means of
binary signals at rates under 1000 bits per second. U. S. Air Force
Cambridge Research Center, TR-57-111, ASTIA Doc. AD117290, Jan. 1957,
and C. P. Smith, A method for speech data processing by means of a
digital computer. Report ERD-TM-58-103, U. S. Air Force Cambridge
Research Center, 1958.
speech at the output of a digitized channel vocoder are cataloged
and  stored  in  a  magnetic  memory  device.  The  number  of  patterns 
stored  can  be  limited  to  any  desired  number. 

Subsequent to this initial cataloging and storage, the patterns
forthcoming from a digital vocoder processing speech are compared with the
patterns stored in the memory of a computer. The code for those
patterns in the memory that come closest to matching those of the
incoming signal from the vocoder is transmitted to the receiver
portion of the system -- a second channel vocoder which resynthesizes
the speech.
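
The cataloging and matching steps can be sketched in Python as follows.
This is a minimal illustration, not a description of the AFCRL
equipment: representing each vocoder frame as a tuple of bits, using
Hamming distance as the measure of "closest," and all function names are
assumptions made here, since the text does not specify them.

```python
from collections import Counter

def build_codebook(training_frames, size):
    """Catalog the `size` most frequent frames seen during training."""
    counts = Counter(training_frames)
    return [frame for frame, _ in counts.most_common(size)]

def hamming(a, b):
    """Number of bit positions in which two frames differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))

def encode(frame, codebook):
    """Return the code (index) of the stored pattern closest to the frame."""
    return min(range(len(codebook)), key=lambda i: hamming(frame, codebook[i]))

def decode(code, codebook):
    """Receiver side: look up the stored pattern for resynthesis."""
    return codebook[code]

# Hypothetical 4-bit frames; a real vocoder frame would be much longer.
training = [(0, 0, 1, 1)] * 3 + [(1, 1, 0, 0)] * 2 + [(1, 0, 1, 0)]
codebook = build_codebook(training, size=2)
code = encode((0, 1, 1, 1), codebook)
print(code, decode(code, codebook))
```

Only the code is transmitted, so the bit rate depends on the number of
stored patterns rather than on the length of a vocoder frame.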

For these tests C. P. Smith played five PB word test tapes (Nos.
1E through 5E, 1 male talker) through his pattern analysis and
storage device. He limited the stored information for spectral patterns
to 600 bits and that for pitch to 250 bits. Following this
compilation of most frequently occurring patterns, nine PB word tests (five
of them the same as those used for making the pattern analysis and
catalog) were played through the entire system -- vocoder analysis,
pattern matching, vocoder resynthesis. The output of the total system
was recorded. In addition, four of the PB tests were played through
the channel vocoders operated "back-to-back" at 2400 bits/sec.

The scores obtained with these recordings are given in Table 1. It
is clear from the results in Table 1 that the pattern matching
technique, as presently operating, degrades PB word scores by a significant
amount (average of 73% for the channel vocoder without the pattern
matching system vs 48% with). Attention is invited to the fact that
list 1E scored best on both systems.
Table 1

Word Scores, AFCRL Speech Compression Systems

AFCRL Channel Vocoder plus Pattern Matching System, 850 bits/sec
(250 bits pitch, 600 bits channel spectra)

    PB List No.       % Correct
    1E                66
    2E                56
    3E                50
    4E                43
    5E*
    Average           51%

    6E                52
    7E                50
    8E                40
    9E                28
    Average           45%

    Grand Average     48%

* Lists 1E through 5E used for compiling vocoder patterns stored in
memory of computer.

AFCRL Channel Vocoder, 2400 bits/sec

    PB List No.       % Correct
    1E                80
    2E                70
    6E                70
    7E                70
    Average           73%
It should not be deduced from these results that the pattern matching
technique does not have promise:

a. The system tested is still in its early experimental
stages and can undoubtedly be somewhat improved.

b. The channel vocoder with which it was paired provides
only marginally satisfactory intelligibility when operated
by itself. There was probably a significant saving in
information rate required for equal intelligibility. At the present time
this could be demonstrated only with the system operating with
a greater pattern storage capacity, or with the pattern matching
system operating with a vocoder that provides adequate speech
intelligibility at a lower bit rate than the vocoder presently
used with the AFCRL pattern matching system.

3.1.4 PB word scores -- female talkers

The results obtained with the female talkers are presented in
Fig. 4. Inasmuch as none of the systems tested was designed for use
with female talkers, the results are, perhaps, only of academic
interest. The engineers responsible for the various vocoders tested
pointed out that the pitch-tracking circuits used in their instruments
were designed specifically for male voices but could probably be
modified to operate more effectively with female voices.

Nevertheless, in view of the generally lower intelligibility scores
for the female voices than the male, even on the reference system,
the design of a vocoder that will provide adequate intelligibility
for both male and female voices is probably a challenging task.
4. EXPERIMENTATION WITH COMBINED "PEAK-PICKER" AND
SPECTRUM SAMPLING (NARROW-BAND) SYSTEM

One of the simplest speech compression techniques studied under
this contract was the so-called spectrum sampling (narrow-band)
system. This system, being researched and developed under a
U. S. Army contract (DA-36-039-SC-78078), was available for further
experimentation at Bolt Beranek and Newman Inc.

Although, as seen in Fig. 2, the particular narrow-band system
available for testing did not perform as well as might be hoped, it
provided a convenient and ready means for studying how well two speech
bandwidth compression techniques might be combined to provide a
greater reduction in the bandwidth of processed speech.

One technique which should markedly improve the performance of the
narrow-band system would be to make one or more of the filters "track"
those spectral peaks that are of sufficient relative amplitude to
clearly distinguish them as vocal tract resonances. For vowel sounds,
these spectral peaks would be the formants -- hence the title
"formant tracker."

Normally, each of the six crystal filters in the narrow-band system
is centered about a fixed location in the lower sideband of a
suppressed-carrier modulated signal, where the modulating signal is
broad-band speech and the carrier is derived from a crystal-controlled
oscillator. To move the filter about in the audio spectrum simply
requires a change in the carrier frequency. A typical channel is shown
in Fig. 5.
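
The relation between carrier frequency and the audio band selected can be
illustrated with a short Python sketch. It assumes only that an audio
component at frequency f appears in the lower sideband at (carrier - f);
the 100 kc crystal filter center used in the example is hypothetical, as
the report gives no filter frequencies, and the function names are
invented for illustration.

```python
def carrier_for_audio_center(crystal_center_hz, audio_center_hz):
    """Carrier needed so that the fixed crystal filter passes the audio
    band centered at audio_center_hz (lower sideband: carrier - audio)."""
    return crystal_center_hz + audio_center_hz

def audio_center_for_carrier(crystal_center_hz, carrier_hz):
    """Audio frequency that falls at the crystal filter center for a given carrier."""
    return carrier_hz - crystal_center_hz

# Hypothetical crystal filter fixed at 100,000 cps.
CRYSTAL_CENTER_HZ = 100_000
for audio_hz in (1000, 1400, 1800):
    carrier = carrier_for_audio_center(CRYSTAL_CENTER_HZ, audio_hz)
    print(f"audio center {audio_hz} cps -> carrier {carrier} cps")
```

Raising the carrier therefore moves the effective filter upward in the
audio spectrum by the same number of cycles per second.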

Figure 6 is a block diagram of the control system for one of the six
narrow-band system filters. The input, a wide-band speech signal, is
first bandpass-filtered to establish the limits over which the
narrow-band system filter will be made to track. The resulting band-limited
signal is fed to a bank of five narrow filters, each of which has an
associated detector. The detector outputs are fed to a "peak-picker,"
whose function it is to determine which of the five narrow filters
exhibits the largest output at a given (clock) time.

The peak-picker output is, in the interval between clock times, a
steady DC voltage proportional to the number (e.g., 1, 2, 3, 4 or 5)
of the narrow filter with the largest output. The peak-picker output
is completely independent of the actual voltages presented at its
inputs over a range of input voltages in excess of 40 dB.

The peak-picker is forced, whenever the level of the band-limited
speech exceeds some threshold, to choose at clock time the largest
of its five inputs. Since the clock rate is approximately 400 pps,
the peak-picker output is a time-varying voltage which may change
every 1/400 second, and which can assume one of five discrete levels.
The presence of the lowest output level indicates, in general, the
presence of a spectral maximum located more within the passband of
the first narrow filter than any of the other four. The next higher
output level indicates a (relative) spectral maximum in the band
covered by the second narrow filter, etc.
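
Functionally, the peak-picker described above selects the index of the
largest of five detector outputs once per clock period. The following
Python sketch illustrates that behavior only; the function names and the
example detector values are hypothetical, and the threshold gating is
left out.

```python
def pick_peak(detector_outputs):
    """Return the number (1 to 5) of the narrow filter with the largest output.

    Only the ordering of the inputs matters, so the result is unchanged
    when all inputs are scaled by the same factor.
    """
    best = max(range(len(detector_outputs)), key=lambda i: detector_outputs[i])
    return best + 1

def track(frames):
    """Peak-picker output level for each clock period.

    `frames` holds one list of five detector outputs per clock period;
    at roughly 400 pps, consecutive entries are about 1/400 second apart.
    """
    return [pick_peak(outputs) for outputs in frames]

# Hypothetical detector readings for three clock periods.
frames = [
    [0.1, 0.5, 0.2, 0.05, 0.3],
    [0.1, 0.2, 0.6, 0.05, 0.3],
    [0.7, 0.2, 0.1, 0.05, 0.3],
]
print(track(frames))
```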

The peak-picker requires a finite time in which to make its decision,
and so its output is examined by means of a sample-and-hold circuit
which is strobed only after the peak-picker output has assumed a steady
value. Since the sampling command pulse disappears before the next
clock signal arrives at the peak-picker, the sample-and-hold output
contains none of the "return-to-zero" transients present in the
peak-picker output.

The sample-and-hold output is fed via a smoothing filter to a
voltage-controlled oscillator (VCO), which serves as the modulation system
carrier (in place of the crystal oscillator described above).

To avoid random shifting of the VCO frequency (i.e., of the
peak-picker output) when there are no appreciable spectral peaks, the
"choose" command to the peak-picker (an interruption of the B+ voltage)
is logically "ANDed" with the output of a threshold detector. Since
the threshold detector measures the overall level at the output of
the band-limiting filter, the peak-picker is not forced to find a
spectral maximum (which it might do in a random manner) unless the
probability is high that one really exists.
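
The gating just described can be sketched in Python as follows. The
sketch is illustrative only: the function name, the threshold value, and
the per-clock decay factor applied to the held voltage when no choice is
made are assumptions, and the smoothing filter and VCO are not modelled.

```python
def control_voltage(frames, band_levels, threshold, decay=0.9):
    """Held control voltage, one value per clock period.

    frames      -- one list of five detector outputs per clock period
    band_levels -- overall level at the band-limiting filter output, per period
    threshold   -- level above which the "choose" command is allowed
    decay       -- assumed factor by which the held value shrinks per period
                   when no choice is made
    """
    held = 0.0
    trace = []
    for outputs, level in zip(frames, band_levels):
        if level > threshold:
            # Clock AND threshold detector: the peak-picker must choose.
            held = float(outputs.index(max(outputs)) + 1)
        else:
            # No choice forced: the held voltage drifts toward zero.
            held *= decay
        trace.append(held)
    return trace

# Hypothetical example: speech present, two quiet periods, speech again.
example = control_voltage(
    frames=[[0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 1, 0]],
    band_levels=[1.0, 0.0, 0.0, 1.0],
    threshold=0.5,
    decay=0.5,
)
print(example)
```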

The decay time constant of the sample-and-hold circuit is adjusted
so that, in the absence of spectral maxima above threshold, the control
voltage presented to the VCO slowly decays from the last definite
value to zero.

"Peak-picker" circuits. Figure 7 is the circuit diagram of the formant
tracker designed to track, for preliminary exploratory purposes, a
narrow-band system crystal filter over the audio range from 1000 to
1800 cps. Trackers for other frequency ranges would be identical
except for component values in the bandpass and narrow filters.

The peak-picker, similar to a system by Flanagan,* consists of five
gas thyratrons arranged in an "exclusive-OR" configuration. The grid
of each tube is supplied with the sum of a positive-going ramp and


*  J.  L.  Flanagan,  "Automatic  Extraction  of  Formant  Frequencies  from 
Continuous  Speech,"  J.  Acoust.  Soc.  Amer.,  110-118,  1956. 
the detected output of one of the narrow filters. If all tubes are
adjusted for equal firing thresholds, then as the ramp rises the first
tube to fire will be the one whose associated narrow filter exhibits
the largest output. Since all tubes share a common plate load
resistor, the firing of one tube precludes the firing of any of the
others until the plate supply voltage is removed. This "choose"
command occurs only if the threshold detector indicates the presence
of sufficient energy in the bandpass filter to warrant making a
determination.

When a tube fires, its plate current is relatively independent of the
grid voltage, and so the voltage developed across any cathode load
will be a steady DC level. The wiper of the potentiometer in the
cathode circuit of the first tube is adjusted for, say, a 1-volt DC
output when that tube is fired, the pot of the second tube is set for
2 volts out, etc. All five potentiometer outputs are added by means
of a diode adder, and since only one tube fires at a time, its output
will be 1, 2, 3, 4 or 5 volts. This quantized signal is fed via a
cathode follower to the sample-and-hold circuit described above.
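
The ramp-and-threshold selection can be imitated numerically. The Python
sketch below is a simplified model under stated assumptions, not a
circuit simulation: the firing threshold, ramp step, ramp limit, and
detector voltages are hypothetical, and the lockout through the common
plate resistor is represented simply by returning as soon as one tube
fires.

```python
def first_to_fire(detector_volts, firing_threshold, ramp_step=0.01, ramp_max=10.0):
    """Return the output level (1 to 5 volts, equal to the tube number) of
    the first tube whose grid voltage, ramp plus detector output, reaches
    the firing threshold. Returns None if no tube fires before the ramp ends.
    """
    steps = int(ramp_max / ramp_step)
    for k in range(steps + 1):
        ramp = k * ramp_step
        fired = [i for i, v in enumerate(detector_volts)
                 if v + ramp >= firing_threshold]
        if fired:
            # If the coarse ramp step lets several grids cross together,
            # the tube with the largest detector output is taken as first.
            winner = max(fired, key=lambda i: detector_volts[i])
            return winner + 1
    return None

# Hypothetical detector outputs in volts; filter 2 has the largest output.
print(first_to_fire([0.2, 0.9, 0.4, 0.1, 0.6], firing_threshold=5.0))
```

Because every grid receives the same ramp, the winner depends only on
which detector output is largest, which is consistent with the
level-independent behavior described for the peak-picker.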

A breadboard model of the peak-picker described above was made to
track, over the frequency range 1000 to 1800 cps, the effective center
frequency of a filter having, at the 30 dB down points, a width of
150 cps. The filter was always presumably located in the frequency
region having the greatest concentration of energy.

Results. Because of the narrow bandwidth of the single filter, the
speech signal was nearly unintelligible and could not be tested by the
usual intelligibility test techniques. However, it was the judgment
of the listeners that the output of the filter was more intelligible
and more speech-like when the filter tracked the "peak" energy in the
range of 1000 to 1800 cps than when the filter was fixed at any
location in the same range.

The results were deemed sufficiently encouraging to undertake the
breadboarding of additional peak-pickers to control the position of
two other narrow-band filters over other frequency regions. A
three-band system should provide speech of sufficient
intelligibility to be measurable.

At this time it became apparent that the funds remaining in the contract
would not permit both the completion of this combined peak-picking/
narrow-band system and the intelligibility testing of other speech
compression systems being developed by Air Force contractors during
the summer and fall of 1962. The latter tests were judged to be more
important, and the experimentation started on the
peak-picking/narrow-band system was terminated.

The peak-picking/narrow-band system does represent a somewhat different
technique for achieving the bandwidth compression of speech than any
of the other systems tested. It is our impression, however, from the
brief tests we made, that this system has but limited promise as a low
information rate speech transmission system.